The project “Statistical Inferences from the Topology of Complex Networks”
was funded by the AFOSR under the Complex Networks program for the period
March 1, 2013 to February 29, 2016. It was granted a No Cost Extension
from March 1, 2016 to August 31, 2016. Here we offer a final report on the
results of this project for the duration of the grant, from March 1, 2013
to August 31, 2016.

Change in PI

The project was awarded to Dr. Peter Bubenik at Cleveland State
University. In August 2015, Dr. Bubenik moved to the University of Florida.
Consequently, the PI on the project was changed to Dr. John Holcomb, the
Chair of the Department of Mathematics at Cleveland State University. The
remaining balance on the grant was sub-awarded to Dr. Bubenik at the
University of Florida, where he continued to work on the project. All
references to the PI below are to Dr. Peter Bubenik.


Summary of the research

Main goals

As described in the abstract of the proposal for this project, its goals
were “to develop and study a new topological descriptor that is designed
for statistical inference, and to use it for that purpose,” and “to develop
this topological machinery from a more abstract point of view, providing a
better framework for studying its stability and for extending the scope of
this technology.”

Topological data analysis provides machinery for summarizing the topology
of complex data as a “barcode” or “persistence diagram”. While these have
been successful tools for visualization, they are unsuitable for further
statistical analysis or machine learning. The main goal of this project was
to develop a new summary compatible with statistics and machine learning.
This goal was met with the development of a new summary, the “persistence
landscape”. This summary is stable, does not lose any information, has
continuous and discrete versions, and obeys a strong law of large numbers
and a central limit theorem. The main results were published in the Journal
of Machine Learning Research in a paper titled “Statistical topological
data analysis using persistence landscapes” [4]. The persistence landscape
is a functional summary which may be viewed as a point in a vector space
(or more precisely, a point in a Hilbert space), so all of the standard
tools in statistics and machine learning are available for subsequent
analysis. For example, one can easily calculate averages and differences,
apply principal component analysis and support vector machines, or feed
these results into a neural network.
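
As a rough illustration of why the landscape fits into standard statistical
workflows, the following Python sketch evaluates a landscape on a grid and
then averages and compares two of them. It assumes the definition from [4],
in which each birth-death pair (b, d) contributes a tent function
max(0, min(t - b, d - t)) and the k-th landscape function is the k-th
largest tent value at each t. The function name, the grid, and the toy
diagrams are hypothetical choices made for illustration; this is not the
implementation in the Persistence Landscape Toolbox.

```python
import numpy as np


def persistence_landscape(diagram, grid, depth):
    """Evaluate the first `depth` landscape functions on `grid`.

    diagram: iterable of (birth, death) pairs with birth <= death.
    grid:    1-D array of parameter values t.
    Returns an array of shape (depth, len(grid)); row k-1 holds lambda_k.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    grid = np.asarray(grid, dtype=float)
    births = diagram[:, 0][:, None]
    deaths = diagram[:, 1][:, None]

    # One tent function per birth-death pair, sampled on the grid.
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births,
                                       deaths - grid[None, :]))

    # At each grid point, order the tent values from largest to smallest.
    tents = np.sort(tents, axis=0)[::-1]

    # Landscape functions beyond the number of pairs are identically zero.
    landscape = np.zeros((depth, grid.size))
    rows = min(depth, tents.shape[0])
    landscape[:rows] = tents[:rows]
    return landscape


# Toy diagrams (illustrative values only).
grid = np.linspace(0.0, 4.0, 5)
first = persistence_landscape([(0.0, 4.0), (1.0, 3.0)], grid, depth=3)
second = persistence_landscape([(0.0, 2.0)], grid, depth=3)

# Because landscapes are vectors, averages and distances are elementwise.
mean_landscape = (first + second) / 2.0
distance = np.linalg.norm(first - second)
```

Sampling on a fixed grid with a fixed depth gives every diagram a vector of
the same length, which is what lets the result be passed to principal
component analysis, a support vector machine, or a neural network.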

The secondary goal of the project was to help place Topological Data
Analysis on a firmer mathematical foundation, strengthening its connections
to mathematics and making it easier for researchers to leverage
mathematical results for analyzing complex data and complex networks. This
goal was met with the publication of the paper “Categorification of
persistent homology” (with J.A. Scott) [7] in the journal Discrete and
Computational Geometry and the paper “Metrics for Generalized Persistence
Modules” (with J.A. Scott and V. de Silva) [5] in the journal Foundations
of Computational Mathematics. These papers develop a very general framework
for topological data analysis.


Extensions of original goals

With the main goals achieved, a number of extensions of these goals were
pursued.

The topological summary developed in this project, the “persistence
landscape”, was validated by demonstrating that it could be combined with
statistical inference and machine learning in a biological application. The
resulting paper, “Using persistent homology and dynamical distances to
analyze protein binding” (with V. Kovacev-Nikolic, D. Nikolic, and G. Heo),
appeared in Statistical Applications in Genetics and Molecular Biology [18].

Together with P. Dlotko, the PI developed efficient algorithms for
constructing the persistence landscape and for combining it with
statistical analysis. The resulting article, “A persistence landscapes
toolbox for topological statistics”, is in the Journal of Symbolic
Computation [6].

Together with P. Bendich, the PI has developed a framework and algorithm
for stabilizing the location of topological features and for stabilizing
topological computations with respect to choices of parameters. The paper
“Stabilizing the output of persistent homology computations” has been
submitted [3].

Together with V. de Silva and V. Nanda, the PI has studied the geometry of
the algebraic objects of study in Topological Data Analysis. The resulting
paper, “Higher interpolation and extension of persistence modules”, is
undergoing peer review.


Impact on the community

The main contribution of this project, the persistence landscape, is
perhaps the most influential development in this research area in the past
few years. The paper [4] has inspired considerable theoretical research and
is starting to be used in a wide variety of applications. The bootstrap has
been applied to provide confidence bands for the persistence landscape
[11, 10]. The persistence landscape has also inspired a number of other
linear topological summaries [11, 22, 8, 23, 1, 19, 12, 13, 25, 2]. The
persistence landscape has been used to study brain images [24], fluid
dynamics [16], brain EEG data [26], complex networks [9], and phase
transitions [17]. It has also recently been combined with neural networks
to study audio signals [20]. According to Google Scholar, this paper has
already been cited 70 times.

The secondary goal of the project was the development of a general
framework for topological data analysis, which was given in [7] and [5].
This framework has already been used by other researchers to develop
algorithms and prove properties in concrete settings such as those for Reeb
graphs [14, 21]. According to Google Scholar, these papers have 42 and 13
citations, respectively.

Software development

The algorithms for persistence landscapes and statistical inference
presented in [6] have been implemented as “The Persistence Landscape
Toolbox”, and the code is publicly available [15].

Dissemination of results

During this project the PI gave 26 invited lectures describing its results.
The venues for these talks included research institutes and leading
universities in the United States, Canada, Mexico, the United Kingdom,
Germany, Denmark, Poland and Japan.


Broader impacts

The PI has made efforts to increase the broader impact of the project. He
founded and serves as Director of the Applied Algebraic Topology Research
Network, a network with close to 300 members that is funded by the NSF
through the Institute for Mathematics and its Applications (IMA). He also
accepted a position as Associate Editor at the new SIAM (Society for
Industrial and Applied Mathematics) Journal on Applied Algebra and
Geometry. In addition, he was a member of the Scientific Committee for the
main conference in this subject, “Applied Topology: Computation, Methods,
and Science,” held in Turin, Italy, in July 2016.

The PI has given outreach talks at the NASA Glenn Research Center’s Summer
Intern Seminar and at the University of Florida Graduate Mathematics
Association Colloquium. He was also the featured speaker at a Summer School
organized by the Mathematical Association of America entitled “Big Data on
the Great Plains.”

At Cleveland State University and the University of Florida (UF) he has
been using the results of this project to teach undergraduate and graduate
students. At UF, he has incorporated topological data analysis (TDA) into
the graduate topology course and has started a Student Applied Topology
seminar.

At UF, the PI has also been training Highly Qualified Personnel. These will
be future researchers and data analysts, and this training will aid the
competitiveness of the United States. He is advising one postdoctoral
researcher, Dr. Michael Catanzaro; two Ph.D. candidates, Alexander Wagner
and Leo Betthauser; and two undergraduate students, Benjamin Whittle and
Dhruv Patel.